Abstract

A few key properties of the World-Wide Web (WWW) have been established, indicating
the lack of any characteristic scales for the WWW, both in its topology and in its dynamics.
Here, we report an experiment which quantifies another power law describing the dynamical
response of the WWW to a Dirac-like perturbation, specifically how the popularity of
a web site evolves and relaxes as a function of time, in response to the publication of a
notice/advertisement in a newspaper. Following the publication of an interview with the
authors by a journalist, which contained our URL, we monitored the rate of downloads of our
papers and found it to obey a $1/t^b$ power law with exponent b = 0.58 ± 0.03. This small
exponent implies long-term memory and can be rationalized using the concept of persistence,
which specifies how long a relaxing dynamical system remains in a neighborhood of its initial
configuration.



It is generally accepted that the World-Wide Web (WWW) provides one of the most efficient
methods for retrieving information. However, little is known about how information actually
flows through the WWW, and even less about how the WWW interacts with other types of media.
Most studies have until now focused on statistical properties of the WWW and the people surfing
on it, the "internauts". A few key properties have been established, indicating the lack of any
characteristic scales for the WWW [?]: (i) the distribution of the number of pages per site is an
approximate power law [?]; (ii) the distributions of outgoing links (Uniform Resource Locators,
or URLs, found in an HTML document) and incoming links (URLs pointing to a certain HTML
document) are well described by a universal power law which seems independent of the search
engine [?]; (iii) the distribution of independent hits or users per web-site also seems to follow
a power law, and the ranking of sites according to their popularity is well described by Zipf's law;
(iv) the distribution of waiting times to access a given page is also a power law distribution [?],
and the correlation function of the WWW traffic intensity as a function of time also exhibits a
slow power law decay [?].

These properties are believed to reflect the evolutionary self-organizing dynamics of the
WWW, which is not well understood and is the subject of active research [?]. In particular, the
WWW provides a very interesting proxy for a fast-evolving ecology of heterogeneous agents in
which several different time scales appear, ranging from the largest time scale, corresponding
to a significant evolution of the web network (months to years), through the response adjustment
time of agents to network evolution or to novel information (hours to months), down to the
access times (seconds to minutes) of single WWW pages.

Here, we report an experiment which probes a property belonging to the intermediate time 
scale. Specifically, we quantify how the popularity of a web site evolves and relaxes as a function 
of time, in response to the publication of a notice/advertisement in a newspaper. The authors 
were interviewed by a journalist from the Danish newspaper JyllandsPosten on a subject of rather 
broad and catchy interest, namely stock market crashes. The interview was published on 14
April 1999 in both the paper version of the newspaper and the electronic version (with
access restricted to subscribers) and included the URLs where the authors' research papers on
the subject could be retrieved. Specifically, the URLs were those of the search engine of the
Los Alamos preprint server and of the first author's home-page at the Niels Bohr Institute's
web-site. Naturally, we had no means of monitoring the downloads from the Los Alamos preprint
server. However, all WWW-activity on the Niels Bohr Institute's web-site is continuously logged 
and kept for record. It was hence possible to monitor the number of downloads of papers as a 
function of time. 

Since the interview was published in Danish, the experiment probes only a small fraction
of the internauts, namely those capable of reading Danish: essentially people of Danish,
Icelandic, Norwegian and Swedish origin and their immediate surroundings. The results reported
below have not been reproduced, as the "impact" of the publication of the interview provided a
rather unique opportunity to monitor in real time the dynamics of information spreading and
persistence. The statistical significance could in principle be improved by repeating this
experiment several times.

In figure (1), we show the cumulative number of downloads N as a function of time t since
the publication of the interview. Only downloads of papers already posted on the home-page
at the time of the publication of the interview have been included in the count, in order to keep
the experiment as well-defined as possible. The error bars are taken as the square root of the
number. We see that the data are surprisingly well captured over two decades by the relation

$$N(t) = \frac{a}{1-b} \, t^{1-b} + c\,t , \qquad (1)$$



corresponding to a download rate $dN(t)/dt = a t^{-b} + c$, giving the number of downloads per
unit time at a time t after the publication of the interview. The constant background rate c takes
into account downloads from people unaware of the interview as well as robots. The best fit
parameters are a = 23.1 ± 0.5 days$^{-1}$, b = 0.58 ± 0.03 and c ≈ 0.76 ± 0.31 days$^{-1}$, over a
total time interval of 100 days. Expression (1) thus establishes a novel self-similar relationship
for the dynamical behavior on the WWW, describing the slow relaxation of the system after an
essentially Dirac-like excitation. The coefficient a controls the absolute number of downloads
per unit time and is thus not universal. It reflects the size of the internaut population
reached by the experiment. Similarly, the coefficient c controls the background rate and
depends on 1) how easily the page can be found and 2) the general interest of the subjects posted
on the page.
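
As an illustration of how a fit of this form can be carried out, the following minimal sketch
exploits the fact that, for a fixed exponent b, the cumulative law is linear in the two
amplitudes a/(1-b) and c. It scans a grid of b values and solves the small least-squares
problem for each. This is not necessarily the procedure used to obtain the quoted parameters;
the function names, the grid of b values and the noise-free synthetic data (generated from the
quoted best-fit values) are assumptions made for the example only.

```python
def cumulative_downloads(t, a, b, c):
    """N(t) = a/(1-b) * t**(1-b) + c*t, for 0 <= b < 1 and t >= 0 (t in days)."""
    return a / (1.0 - b) * t ** (1.0 - b) + c * t


def fit_relaxation(times, counts, b_grid):
    """Grid search over b; for each b solve the 2x2 normal equations
    for A = a/(1-b) and c. Returns the best (a, b, c)."""
    best = None
    for b in b_grid:
        x = [t ** (1.0 - b) for t in times]
        sxx = sum(v * v for v in x)
        sxt = sum(v * t for v, t in zip(x, times))
        stt = sum(t * t for t in times)
        sxn = sum(v * n for v, n in zip(x, counts))
        stn = sum(t * n for t, n in zip(times, counts))
        det = sxx * stt - sxt * sxt
        if abs(det) < 1e-12:
            continue  # degenerate design (e.g. b = 0 makes both columns equal)
        amp = (sxn * stt - sxt * stn) / det
        c = (sxx * stn - sxt * sxn) / det
        sse = sum((amp * v + c * t - n) ** 2 for v, t, n in zip(x, times, counts))
        if best is None or sse < best[0]:
            best = (sse, amp * (1.0 - b), b, c)
    _, a, b, c = best
    return a, b, c


# Synthetic, noise-free data over 100 days (an assumption, not the measured counts).
days = list(range(1, 101))
synthetic_counts = [cumulative_downloads(t, 23.1, 0.58, 0.76) for t in days]
b_values = [k / 100 for k in range(30, 91)]  # hypothetical search range for b
a_fit, b_fit, c_fit = fit_relaxation(days, synthetic_counts, b_values)
```

On real counts one would weight each point by its error bar and estimate parameter
uncertainties, which this sketch omits.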

The finding that the relaxation exponent b is less than one has an important consequence,
namely non-stationarity and "aging" in the technical sense of a breaking of ergodicity. Consider
N successive downloads separated in time by $\Delta t_i$, $i = 1, ..., N$, where
$\Delta t_1 + \Delta t_2 + ... + \Delta t_N = t = N \langle \Delta t \rangle$. The distribution of
the time intervals $\Delta t$ between downloads is a power law $1/\Delta t^{1+x}$, where x is
determined from the fact that

$$\langle \Delta t \rangle \sim \int^{\Delta t_{max}} \frac{\Delta t \, d\Delta t}{\Delta t^{1+x}} \sim \Delta t_{max}^{1-x} .$$

Since the maximum $\Delta t_{max}$ among N trials is typically given by
$N \int_{\Delta t_{max}}^{\infty} d\Delta t / \Delta t^{1+x} \sim 1$, we have
$\Delta t_{max} \sim N^{1/x}$. Thus $t = N \langle \Delta t \rangle \sim N^{1/x}$, giving
$N \sim t^x$ for x < 1. Comparing with the $t^{1-b}$ growth of N(t), we can identify the
exponent x with 1 − b and thus find that the distribution of waiting times between successive
downloads is a power law with an exponent x = 1 − b ≈ 0.42, less than one. One can then show that
this power law distribution of time intervals between downloads implies that the longer the time
since the last download, the longer the expected time till the next one [?]. In other words, any
expectation of a download that is estimated today depends on the past in a manner which does not
decay. This is a hallmark of "aging". The mechanism is similar to the "weak breaking of
ergodicity" in spin glasses that occurs when the exponent x of the distribution of trapping times
in meta-stable states is less than one [?].
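
The aging statement can be made concrete with a small sketch. It assumes, for illustration
only, a pure power-law tail for the waiting-time distribution, so that the probability of
waiting longer than τ scales as τ^(-x). Because the mean waiting time is infinite for x < 1,
the sketch uses the median remaining wait instead of the expectation; both function names are
hypothetical.

```python
def conditional_survival(elapsed, extra, x):
    """P(wait > elapsed + extra | wait > elapsed) for a tail S(tau) ~ tau**(-x).
    The unknown prefactor and lower cutoff cancel in the ratio."""
    return (elapsed / (elapsed + extra)) ** x


def median_remaining_wait(elapsed, x):
    """Extra time u at which the conditional survival equals 1/2:
    (elapsed / (elapsed + u))**x = 1/2  =>  u = elapsed * (2**(1/x) - 1)."""
    return elapsed * (2.0 ** (1.0 / x) - 1.0)


# With x = 1 - b = 0.42, the median remaining wait grows linearly with
# the time already elapsed since the last download.
x_tail = 0.42
waits = [median_remaining_wait(s, x_tail) for s in (1.0, 10.0, 100.0)]
```

Under this assumption the typical remaining wait is proportional to the time already
elapsed, which is the sense in which the system keeps a memory of its past.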

How can we rationalize this relation (1)? We propose the following very naive but illustrative
model: think of the population of internauts as subjected to the influence of the newspaper
publication, which may trigger an activity (downloading from our site). Let us think of this
influence as a field that diffuses and spreads dynamically in the complex network of internauts
and in their minds. This diffusive field captures the dynamics of information, rumor spreading,
psychological decision and so on. Let us assume that the decision to act and download from our
site is triggered when the influence field reaches a threshold. Then the rate dN(t)/dt is
proportional to the probability for the field not to have reached the threshold, i.e. to the
probability to remain in the neighborhood of its initial state. This problem falls in the class
of the so-called "persistence phenomenon" discovered in a large variety of systems [?], which
specifies how long a relaxing dynamical system remains in a neighborhood of its initial
configuration. For a Gaussian process, the persistence exponent x can be shown to be a functional
of the two-point temporal correlator [11, 12, 13]. For Markovian or weakly non-Markovian random
walk processes, the exponent x and therefore b is close to 1/2, as we find empirically.

Figure (2) shows the residue obtained by subtracting the formula (1) from the data points.
Figure (3) shows that the spectrum of this residue is sharply peaked on a characteristic frequency
corresponding exactly to a period of one week. Since the publication date of 14 April was a
Wednesday, the dips shown in figure (2) correspond to weekends: apparently, most people
probed by the experiment still mainly have Internet and printer access through their job,
which explains the low activity during weekends and the weekly periodicity.
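
A spectral peak of this kind can be located with a plain discrete Fourier transform of the
daily residue. The sketch below is a minimal illustration on a synthetic residue, not the
measured one: it assumes one sample per day, 14 full weeks of data, day 0 falling on a
Wednesday, and dips on days 3 and 4 of each week (Saturday and Sunday). The function names
and the dip amplitudes are hypothetical.

```python
import cmath
import math


def power_spectrum(values):
    """Return (frequency in cycles/day, power) pairs for the mean-removed series,
    for frequencies m/n with m = 1 .. n//2 (one sample per day assumed)."""
    n = len(values)
    mean = sum(values) / n
    centered = [v - mean for v in values]
    spectrum = []
    for m in range(1, n // 2 + 1):
        coeff = sum(v * cmath.exp(-2j * math.pi * m * k / n)
                    for k, v in enumerate(centered))
        spectrum.append((m / n, abs(coeff) ** 2))
    return spectrum


def dominant_period(values):
    """Period in days of the frequency carrying the largest power."""
    freq, _ = max(power_spectrum(values), key=lambda pair: pair[1])
    return 1.0 / freq


# Synthetic residue: day 0 is a Wednesday, so days 3 and 4 of each week are
# Saturday and Sunday; downloads dip on those days (illustrative values).
synthetic_residue = [-1.0 if k % 7 in (3, 4) else 0.4 for k in range(98)]
period = dominant_period(synthetic_residue)
```

For a record whose length is not a whole number of weeks, the peak would be spread over
neighbouring frequencies, and a window function or zero padding would help locate it.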

Figure 1: Cumulative number of downloads N as a function of time t from the appearance of
the interview on Wednesday 14 April 1999. The fit is $N(t) = \frac{a}{1-b} t^{1-b} + ct$ with b ≈ 0.58.

Figure 2: Residue obtained by subtracting the fit shown in figure 1 from the data points.




Figure 3: Spectrum of the data in figure 2, showing a weekly periodicity.



